Skip to content

Conversation

@calvinp0
Copy link
Member

Addition of CREST Adapter that complements the heuristic adapter.

Fixed Arkane to check LoT by ignoring the year suffix (plans to implement year in the input)
AutoTST does not attempt to import autotst (due to its reliance on RMG-Py env).
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a CREST adapter to generate transition state (TS) conformer guesses for H_Abstraction reactions, complementing the existing heuristic approach. The integration includes CREST path detection, conda environment setup, and job submission/monitoring workflows. Several installation scripts were improved with better environment management, PATH/PYTHONPATH handling, and more robust activation/deactivation hooks. The Arkane module gained flexible model chemistry matching that handles hyphenated method names and year suffixes.

Key Changes:

  • New CREST adapter module (arc/job/adapters/ts/crest.py) integrating constrained TS conformer searches into the heuristics workflow
  • Enhanced Arkane model chemistry lookup with normalization and year-aware matching for RMG quantum corrections database
  • Improved installation scripts with better conda environment isolation and conflict resolution (AutoTST/RMG-Py, GCN)

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
arc/job/adapters/ts/crest.py New CREST adapter with job setup, submission, monitoring, and H-abstraction atom detection
arc/job/adapters/ts/heuristics.py Integrates CREST TS search alongside heuristics, submits jobs and processes results
arc/job/adapters/ts/crest_test.py Unit tests for CREST input file generation
arc/statmech/arkane.py Adds flexible matching for LevelOfTheory strings, normalizing hyphens and handling year suffixes
arc/species/converter.py New reorder_xyz_string function for coordinate format/unit conversion with deprecation wrapper
arc/species/converter_test.py Tests for the new XYZ reordering function
arc/settings/settings.py CREST executable discovery across conda/mamba/micromamba environments and standalone builds
devtools/install_crest.sh New installer for CREST 2.12 via conda
devtools/install_autotst.sh Enhanced PATH management to isolate AutoTST from RMG-Py conflicts
devtools/install_rmg.sh Improved PATH/PYTHONPATH additions with duplicate prevention
devtools/install_gcn.sh Fixed activation hooks and switched to conda run for pip installs
devtools/install_pyrdl.sh Corrected cmake installation to target arc_env explicitly
devtools/install_torchani.sh Added LOGFILE variable for cleaner log handling
devtools/install_all.sh Added --no-rmg flag, reordered PyRDL after ARC, added CREST to external tools
devtools/crest_environment.yml Conda environment specification for CREST 2.12
Makefile Added install-crest target
AGENTS.md Repository guidelines file (added but also gitignored - should be removed)
.gitignore Ignores AGENTS.md
arc/species/species.py Added logging for similar TS guesses during clustering
arc/job/adapters/ts/autotst_ts.py Better error handling for missing AutoTST package in subprocess
Comments suppressed due to low confidence (1)

arc/job/adapters/ts/heuristics.py:289

  • When CREST generates TS guesses, they will have different timing information than the heuristics guesses, but all guesses (both Heuristics and CREST) are being assigned tsg.t0 and tsg.execution_time from the heuristics TSGuess object. CREST guesses should have their own timing information, or at minimum the execution_time should reflect the actual CREST job duration, not the heuristics generation time.
                                       t0=tsg.t0,
                                       execution_time=tsg.execution_time,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 306 to 310
atom for atom in closest_atoms if not atom.startswith("H")
)
if_hydrogen_present = any(
atom
for atom in closest_atoms
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in get_h_abs_atoms is incorrect. Line 305-312 checks any(atom for atom in closest_atoms if not atom.startswith("H")) which iterates over keys of closest_atoms dict (strings like "H0", "C1"), not the values. It should check any(atom for atom in atom_neighbours if not atom.startswith("H")) to check the actual neighbors of the hydrogen. The same issue exists on lines 308-312 for if_hydrogen_present.

Suggested change
atom for atom in closest_atoms if not atom.startswith("H")
)
if_hydrogen_present = any(
atom
for atom in closest_atoms
atom for atom in atom_neighbours if not atom.startswith("H")
)
if_hydrogen_present = any(
atom
for atom in atom_neighbours

Copilot uses AI. Check for mistakes.
"H": extract_digits(best_occurrence["H"]),
"A": extract_digits(best_occurrence["A"]),
"B": extract_digits(best_occurrence["B"]),
}
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When len(condition_occurrences) == 1, the function should return the single occurrence but doesn't. After the if len(condition_occurrences) > 1: block completes at line 341, control falls through to the else block at line 342. The single occurrence case needs to be handled explicitly, or the logic restructured to return condition_occurrences[0] when there's exactly one match.

Suggested change
}
}
elif len(condition_occurrences) == 1:
occurrence = condition_occurrences[0]
return {
"H": extract_digits(occurrence["H"]),
"A": extract_digits(occurrence["A"]),
"B": extract_digits(occurrence["B"]),
}

Copilot uses AI. Check for mistakes.
Args:
xyz_str (str): The string xyz format to be converted.
reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates.
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring is missing the units parameter which is part of the function signature. This parameter should be documented to specify the units of the input coordinates (either 'angstrom' or 'bohr').

Suggested change
reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates.
reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates.
units (str, optional): The units of the input coordinates (either 'angstrom' or 'bohr').

Copilot uses AI. Check for mistakes.
}
else:

# Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling error: "Hyodrogen" should be "Hydrogen".

Suggested change
# Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate
# Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hydrogen atom has the lowest distance aggregate

Copilot uses AI. Check for mistakes.
Comment on lines 143 to 148
autotst_on () {
export AUTOTST_ROOT="__AUTOTST_PATH__"
export AUTOTST_OLD_PATH="$PATH"
export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}"
_strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; }
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _strip_path function is defined inside the heredoc with double-quoted variables, but the function body uses single quotes for string operations. This creates a fragile quoting situation. Consider defining this as a regular shell function outside the heredoc, or ensure consistent quoting to avoid unexpected variable expansion issues.

Suggested change
autotst_on () {
export AUTOTST_ROOT="__AUTOTST_PATH__"
export AUTOTST_OLD_PATH="$PATH"
export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}"
_strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; }
_strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; }
autotst_on () {
export AUTOTST_ROOT="__AUTOTST_PATH__"
export AUTOTST_OLD_PATH="$PATH"
export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}"

Copilot uses AI. Check for mistakes.
distances = dataframe.loc[hydrogen_key, heavy_atoms].sum()
occurrence_distances.append((occurrence, distances))
except KeyError as e:
print(f"Error accessing distances for occurrence {occurrence}: {e}")
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use logger.error() instead of print() for consistency with the rest of the codebase. The logger is already imported and used elsewhere in this module.

Suggested change
print(f"Error accessing distances for occurrence {occurrence}: {e}")
logger.error(f"Error accessing distances for occurrence {occurrence}: {e}")

Copilot uses AI. Check for mistakes.

def _iter_level_keys_from_section(file_path: str,
section_start: str,
section_end: str) -> list[str]:
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type hint list[str] requires Python 3.9+. Since this project supports Python 3.7+ (as seen in environment files), you need to either use List[str] from typing (which is already imported on line 9) or add from __future__ import annotations at the top of the file.

Suggested change
section_end: str) -> list[str]:
section_end: str) -> List[str]:

Copilot uses AI. Check for mistakes.
return method.lower().replace('-', '')


def _split_method_year(method_norm: str):
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type annotation is missing for _split_method_year. Based on the docstring examples and implementation, it should return Tuple[str, Optional[int]].

Copilot uses AI. Check for mistakes.
Comment on lines 23 to 27
# try:
# from autotst.reaction import Reaction as AutoTST_Reaction
# except (ImportError, ModuleNotFoundError):
# HAS_AUTOTST = False

Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to contain commented-out code.

Suggested change
# try:
# from autotst.reaction import Reaction as AutoTST_Reaction
# except (ImportError, ModuleNotFoundError):
# HAS_AUTOTST = False

Copilot uses AI. Check for mistakes.

import datetime
import itertools
import os
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'os' is not used.

Suggested change
import os

Copilot uses AI. Check for mistakes.
else:
raise ValueError("No valid hydrogen atom found.")

return {}

Check warning

Code scanning / CodeQL

Unreachable code Warning

This statement is unreachable.

Copilot Autofix

AI about 2 months ago

The best fix is to delete the final unreachable return {} at line 378 in the get_h_abs_atoms function in arc/job/adapters/ts/crest.py. All execution paths in the function already either return with a dictionary or raise an exception. Removing this line does not alter any functionality, as it cannot be executed under any code path. No imports, method or variable definitions, or other changes are needed—just removal of this single line.


Suggested changeset 1
arc/job/adapters/ts/crest.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/arc/job/adapters/ts/crest.py b/arc/job/adapters/ts/crest.py
--- a/arc/job/adapters/ts/crest.py
+++ b/arc/job/adapters/ts/crest.py
@@ -375,4 +375,3 @@
         else:
             raise ValueError("No valid hydrogen atom found.")
 
-    return {}
EOF
@@ -375,4 +375,3 @@
else:
raise ValueError("No valid hydrogen atom found.")

return {}
Copilot is powered by AI and may make mistakes. Always verify output.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 12 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 40 to +44
from arc.species.converter import (compare_zmats, relocate_zmat_dummy_atoms_to_the_end, zmat_from_xyz, zmat_to_xyz,
add_atom_to_xyz_using_internal_coords, sorted_distances_of_atom)
from arc.species.converter import (
compare_zmats,
relocate_zmat_dummy_atoms_to_the_end,
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate import blocks found. Lines 30-38 import from arc.job.adapters.ts.crest, while lines 42-48 import from arc.species.converter. The second converter import block at lines 42-48 duplicates some imports already present earlier in the file (lines 40-41). Consider consolidating these imports to avoid confusion.

Suggested change
from arc.species.converter import (compare_zmats, relocate_zmat_dummy_atoms_to_the_end, zmat_from_xyz, zmat_to_xyz,
add_atom_to_xyz_using_internal_coords, sorted_distances_of_atom)
from arc.species.converter import (
compare_zmats,
relocate_zmat_dummy_atoms_to_the_end,
from arc.species.converter import (
add_atom_to_xyz_using_internal_coords,
compare_zmats,
relocate_zmat_dummy_atoms_to_the_end,
sorted_distances_of_atom,

Copilot uses AI. Check for mistakes.
# --- build CREST command string ---
# Example: crest coords.ref --cinp constraints.inp --noreftopo -T 8
local_server = SERVERS.get("local", {})
cpus = int(local_server.get("cpus", 8))
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable cpus is defined but then not used consistently. Line 109 re-fetches the same value using local_server.get('cpus', 8) instead of using the cpus variable. Consider using the cpus variable for consistency.

Copilot uses AI. Check for mistakes.
all_done = False
if all_done:
break
time.sleep(min(check_interval, 100))
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded value 100 (seconds) in time.sleep appears to be a magic number without explanation. Consider extracting this as a named constant (e.g., MAX_CHECK_INTERVAL = 100) to clarify its purpose.

Copilot uses AI. Check for mistakes.
Comment on lines +78 to +80
if project_directory is not None and os.path.isfile(os.path.join(project_directory, xyz_str)):
xyz_str = os.path.join(project_directory, xyz_str)

Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic appears to handle file path resolution, but it's unclear and potentially incorrect. If xyz_str is already a valid file path in project_directory, it gets reassigned to the full path, but then this full path is used as if it's XYZ string content in subsequent processing. This seems like it would fail when trying to parse the file path as XYZ coordinates. Consider clarifying the intended behavior or removing this if it's not needed.

Suggested change
if project_directory is not None and os.path.isfile(os.path.join(project_directory, xyz_str)):
xyz_str = os.path.join(project_directory, xyz_str)
if project_directory is not None:
file_path = os.path.join(project_directory, xyz_str)
if os.path.isfile(file_path):
with open(file_path, 'r') as f:
xyz_str = f.read()

Copilot uses AI. Check for mistakes.
Comment on lines +908 to +909
crest_paths = list()
all_zmats = list()
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name crest_paths is ambiguous - it actually stores CREST job directories, not just paths. Consider renaming to crest_job_dirs or crest_job_paths for clarity.

Copilot uses AI. Check for mistakes.
from arc.level import Level


AUTOTST_PYTHON = settings['AUTOTST_PYTHON']
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for setting HAS_AUTOTST is incomplete. Line 22 sets HAS_AUTOTST = True unconditionally, then lines 29-30 may set it to False if AUTOTST_PYTHON is None. However, there's no check if AUTOTST_PYTHON is defined in settings. If settings doesn't have 'AUTOTST_PYTHON', this will raise a KeyError. Consider wrapping AUTOTST_PYTHON retrieval in a try-except block or using settings.get('AUTOTST_PYTHON').

Suggested change
AUTOTST_PYTHON = settings['AUTOTST_PYTHON']
AUTOTST_PYTHON = settings.get('AUTOTST_PYTHON')

Copilot uses AI. Check for mistakes.

# --- activation hook -----------------------------------------------------
cat <<'ACTHOOK' >"$act"
cat <<ACTHOOK >"$act"
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The heredoc delimiter ACTHOOK changed from single-quoted 'ACTHOOK' to unquoted ACTHOOK. This changes behavior - the unquoted version will perform variable expansion inside the heredoc. While this appears intentional to use the $repo variable, ensure this is the desired behavior and that all variables within the heredoc should be expanded at write-time rather than read-time.

Copilot uses AI. Check for mistakes.
"""
Backwards compatible wrapper for reorder_xyz_string.
"""
warnings.warn("str_to_str was renamed to reorder_xyz_string", DeprecationWarning)
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deprecation warning should specify a version or timeline for removal, and should use stacklevel=2 to ensure the warning points to the caller's code rather than this wrapper. Consider: warnings.warn('str_to_str was renamed to reorder_xyz_string and will be removed in version X.Y', DeprecationWarning, stacklevel=2)

Suggested change
warnings.warn("str_to_str was renamed to reorder_xyz_string", DeprecationWarning)
warnings.warn(
"str_to_str was renamed to reorder_xyz_string and will be removed in a future ARC release",
DeprecationWarning,
stacklevel=2,
)

Copilot uses AI. Check for mistakes.
Comment on lines +995 to +1000
xyz = str_to_xyz(xyz_guess_crest)
df_dmat = convert_xyz_to_df(xyz)
elif isinstance(xyz_guess_crest, list):
xyz_temp = "\n".join(xyz_guess_crest)
xyz_to_dmat = str_to_xyz(xyz_temp)
df_dmat = convert_xyz_to_df(xyz_to_dmat)
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable naming is inconsistent and confusing. Line 995 uses 'xyz', line 999 uses 'xyz_to_dmat' for the same purpose (result of str_to_xyz). The name 'xyz_to_dmat' is particularly misleading as it sounds like a function name. Consider using consistent naming like 'xyz_dict' for all str_to_xyz results.

Suggested change
xyz = str_to_xyz(xyz_guess_crest)
df_dmat = convert_xyz_to_df(xyz)
elif isinstance(xyz_guess_crest, list):
xyz_temp = "\n".join(xyz_guess_crest)
xyz_to_dmat = str_to_xyz(xyz_temp)
df_dmat = convert_xyz_to_df(xyz_to_dmat)
xyz_dict = str_to_xyz(xyz_guess_crest)
df_dmat = convert_xyz_to_df(xyz_dict)
elif isinstance(xyz_guess_crest, list):
xyz_temp = "\n".join(xyz_guess_crest)
xyz_dict = str_to_xyz(xyz_temp)
df_dmat = convert_xyz_to_df(xyz_dict)

Copilot uses AI. Check for mistakes.
Comment on lines +377 to +378

return {}
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is unreachable.

Suggested change
return {}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants